BIOST 561: using the command line and the server

Lecture 7

Announcements

Note 1 about HW3: Using expect_true()

set.seed(0)
vec1 <- stats::rpois(5, lambda = 1)
set.seed(0)
vec2 <- stats::rpois(5, lambda = 1)
set.seed(1)
vec3 <- stats::rpois(5, lambda = 1)

vec1
## [1] 2 0 1 1 2
vec2
## [1] 2 0 1 1 2
vec3
## [1] 0 1 1 2 0
testthat::expect_true(vec1 == vec2)
## Error: vec1 == vec2 is not TRUE
## 
## `actual`:   TRUE TRUE TRUE TRUE TRUE
## `expected`: TRUE
testthat::expect_true(all(vec1 == vec2))
testthat::expect_true(any(vec1 != vec3))
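
The first call fails because == compares element by element, so it hands expect_true() five logical values where exactly one is expected. A minimal base-R sketch of the same point (cmp is a name introduced here for illustration):

```r
set.seed(0)
vec1 <- stats::rpois(5, lambda = 1)
set.seed(0)
vec2 <- stats::rpois(5, lambda = 1)

# `==` is elementwise: the result has one logical value per element
cmp <- vec1 == vec2
length(cmp)
## [1] 5

# all() collapses the comparison into the single TRUE/FALSE that expect_true() needs
all(cmp)
## [1] TRUE

# identical() is another way to get a single logical for "exactly the same vector"
identical(vec1, vec2)
## [1] TRUE
```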

Note 2 about HW3: Imports: vs. Suggests:

Note 3 about HW3: An annoying utils warning

Many of you had a warning when you ran devtools::check() that reads something like this:

Undefined global function or variables:
  combn
Consider adding
  importFrom("utils", "combn")

To fix this:
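
Two standard remedies are sketched here, assuming your package code calls combn() without a namespace prefix. The function count_pairs is a hypothetical stand-in for your own code.

```r
# Option 1: call the function with an explicit namespace prefix.
count_pairs <- function(n) {
  pairs <- utils::combn(n, 2)  # 2 x choose(n, 2) matrix, one pair per column
  ncol(pairs)
}

# Option 2 (roxygen2): keep the bare call to combn() and declare the import
# in the function's documentation block, then re-run devtools::document():
#
#   #' @importFrom utils combn

count_pairs(4)
## [1] 6
```

In either case, utils typically also needs to be listed under Imports: in your DESCRIPTION file.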

Note 4 about HW3: What did I mean by “no correct solution”?

Debugging

Two important principles on debugging: Reproducing errors

One: Find how to reproduce your errors

Two important principles on debugging: Tracing

Two: Tracing your code to find the specific line that fails

median_random_rowSums <- function(mat, 
                                  trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    mat_tmp <- mat[,idx]
    return(rowSums(mat_tmp))
  })
  
  stats::median(rowsum_vec)
}

mat <- matrix(1:25, nrow = 5, ncol = 5)
median_random_rowSums(mat)
## Error in rowSums(mat_tmp): 'x' must be an array of at least two dimensions

Two ways to do tracing

Non-interactive:

median_random_rowSums <- function(mat, 
                                  trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    print(paste0("Trial: ", trial))
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    print(idx)
    mat_tmp <- mat[,idx]
    vec <- rowSums(mat_tmp)
    print(vec)
    return(vec)
  })
  
  stats::median(rowsum_vec)
}

set.seed(0) # to reproduce my errors!
mat <- matrix(1:25, nrow = 5, ncol = 5)
median_random_rowSums(mat)
## [1] "Trial: 1"
## [1] 1 4 5
## [1] 38 41 44 47 50
## [1] "Trial: 2"
## [1] 2 3 4 5
## [1] 54 58 62 66 70
## [1] "Trial: 3"
## [1] 4
## Error in rowSums(mat_tmp): 'x' must be an array of at least two dimensions
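
The trace shows the failure happens when idx contains a single column: mat[,idx] then drops to a plain vector, which rowSums() rejects. One possible fix, sketched here as an assumption rather than the official solution, is to subset with drop = FALSE so the result stays a matrix (the name median_random_rowSums_fixed is made up for this example):

```r
mat <- matrix(1:25, nrow = 5, ncol = 5)

# Selecting one column drops the dimensions by default...
dim(mat[, 4])
## NULL
# ...but drop = FALSE keeps a 5 x 1 matrix that rowSums() accepts
dim(mat[, 4, drop = FALSE])
## [1] 5 1

median_random_rowSums_fixed <- function(mat, trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    mat_tmp <- mat[, idx, drop = FALSE]  # stays a matrix even for 0 or 1 columns
    rowSums(mat_tmp)
  })

  stats::median(rowsum_vec)
}

set.seed(0)
res <- median_random_rowSums_fixed(mat)
```

Note that a trial that selects zero columns now contributes row sums of 0 instead of failing. Whether that is the behavior you want is a design decision.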

Interactive:
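
One common way to trace interactively in R is browser(), which pauses execution and lets you inspect variables at the console. Whether this is the tool used in lecture is an assumption; the function name median_random_rowSums_debug and the pause condition are made up for this sketch.

```r
median_random_rowSums_debug <- function(mat, trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    # Pause only in the suspicious case, and only in an interactive session
    if (interactive() && length(idx) < 2) browser()
    mat_tmp <- mat[,idx]
    return(rowSums(mat_tmp))
  })

  stats::median(rowsum_vec)
}

# In an interactive session, run:
#   set.seed(0)
#   median_random_rowSums_debug(matrix(1:25, nrow = 5, ncol = 5))
# At the Browse prompt, print idx or mat[,idx], press n to step to the next
# line, or Q to quit.
```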

NOTE ABOUT OPERATING SYSTEMS FOR THIS LECTURE

Opening the terminal (Macs)

Opening the terminal (Windows)

Alternatively 1:

Alternatively 2:

Using Git from the terminal

Moving on: Interacting with the Biostat server

Logging into the server

It should look something like this:

Moving files to/from the server

Two ways:

In-class preparation (Part 1):

In-class preparation (Part 2):

Submitting a job on the server

Okay, so what exactly did you put into your UWBiost561 package that’s now on Bayes as well?

There are multiple moving parts:

Looking at demo_bayes.R (nothing too special)

# store some useful information
date_of_run <- Sys.time()
session_info <- devtools::session_info()
set.seed(10)

# generate a random matrix
p <- 2000
mat <- matrix(rnorm(p^2), p, p)
mat <- mat + t(mat)

# print out some elements of the matrix
print(mat[1:5,1:5])

# compute eigenvalues
res <- eigen(mat)

# save the results
save(mat, res,
     date_of_run, session_info,
     file = "~/demo_bayes_output.RData")

print("Done! :)")
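
Once the job finishes, the saved objects can be read back with load(). A small sketch of that round trip, using a 5 x 5 matrix and a temporary file as stand-ins for p = 2000 and ~/demo_bayes_output.RData (the names out_file, env, and loaded_names are introduced here for illustration):

```r
set.seed(10)

# small stand-in for the 2000 x 2000 matrix in demo_bayes.R
p <- 5
mat <- matrix(rnorm(p^2), p, p)
mat <- mat + t(mat)
res <- eigen(mat)

# save to a temporary file instead of the home directory
out_file <- tempfile(fileext = ".RData")
save(mat, res, file = out_file)

# load into a separate environment so nothing in the workspace is overwritten
env <- new.env()
loaded_names <- load(out_file, envir = env)

print(loaded_names)            # names of the objects that were restored
print(isSymmetric(env$mat))    # mat + t(mat) is symmetric by construction
print(length(env$res$values))  # one eigenvalue per row of the matrix
```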

What on earth is a .slurm script?

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • This is what a SLURM script looks like.
  • It specifies the job’s name, account, partition, time limit, and memory, and then the command to run.
  • Realistically, you make a new SLURM script by copy-pasting a previous one and changing a few things. (I personally never memorized how to write a SLURM script from scratch.)

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • Your SLURM script must must must always start with the line #!/bin/bash.
  • (Don’t ask why. This is one of those things you do without questioning. If you must know, you can google “hashbang”, also called “shebang”.)

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • The R CMD BATCH line is the most important line in the SLURM script.
  • R CMD BATCH is the terminal command that runs a .R file.
  • The flags --no-save and --no-restore are optional. (They make your life just a bit easier, so you might as well keep them around.)
  • The argument is demo_bayes.R, which is the .R file you wish to run.
  • That is, this file (demo_bayes.slurm) is telling Bayes: “Hey, I wish to run the script demo_bayes.R.”

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • The --job-name line sets the name of your job.
  • The flag is --job-name and the value I’m setting it to is demo.
  • (More on this later.)

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • The --account line sets the account you’re using to run your job.
  • The flag is --account and the value I’m setting it to is biostat.
  • You will never change this when you’re using Bayes (since you’re a Biostat student running your job on the Biostat server), so don’t worry about it. Just keep it here and never touch it.

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • The --partition line sets the partition you’re running your job on.
  • The flag is --partition and the value I’m setting it to is students-12c128g.
  • This is the student partition on Bayes. Only students (you guys) can use it. I (Kevin) cannot use it.
  • (More on this later. You can change to another partition if you think the partition you’re using is too busy.)

What does a typical .slurm script look like?

#!/bin/bash

#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g

#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb

R CMD BATCH --no-save --no-restore demo_bayes.R
  • After the R CMD BATCH line, the --time and --mem-per-cpu lines are the next most important lines in the entire SLURM script.
  • The flags are --time and --mem-per-cpu, and the values I’m setting them to are 12:00:00 and 10gb.
  • This tells Bayes: “Hey Bayes, I think my job will need at most 12 hours of time and at most 10 gigabytes of memory per CPU.”
  • This helps Bayes figure out how to allocate resources for your job.
  • (More on this next week, when we talk about server etiquette. Remember, Bayes is for the entire department, not just you!)
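
For a rough sense of scale when picking --mem-per-cpu, you can estimate how much memory the objects in your script take. A back-of-the-envelope sketch for the matrix in demo_bayes.R, assuming 8 bytes per double and ignoring the working memory eigen() needs on top (approx_bytes is a name introduced here):

```r
p <- 2000

# A p x p matrix of doubles stores p^2 numbers at 8 bytes each
approx_bytes <- p^2 * 8
approx_bytes / 1e6   # in megabytes
## [1] 32

# object.size() reports R's own estimate for a real object of that shape
mat <- matrix(0, p, p)
format(utils::object.size(mat), units = "MB")
```

By this estimate, mat and the eigenvector matrix returned by eigen() are each roughly 32 MB, well below the 10gb requested.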

In-class, running your first SLURM script on the server

Hm… about that interactive R session

Question: Why didn’t we just run our demo_bayes.R script in this interactive R session (even though it lacks a fancy GUI like RStudio)?

Answer: More on this next week when we talk about server etiquette!!

A few notes about .slurm scripts

Some practical advice about editing code (that might contradict my actual advice outside this course)

Graphical explanation

Graphical explanation

Graphical explanation

What’s to come in HW4

With the remaining time…